class: center, middle, inverse, title-slide .title[ # APEC8211: Recitation 2 ] .author[ ### Shunkei Kakimoto ] --- class: middle <style type="text/css"> .small-code .remark-code{ font-size: 50% } .medium-code .remark-code{ font-size: 80% } .xlarge { font-size: 150% } .large { font-size: 130% } .medium { font-size: 80% } .small { font-size: 70% } .xsmall { font-size: 50% } .my-one-page-font { font-size: 30px; } .remark-slide-number { display: none; } .remark-slide-content.hljs-github h1 { margin-top: 5px; margin-bottom: 25px; } .remark-slide-content.hljs-github { padding-top: 10px; padding-left: 30px; padding-right: 30px; } .panel-tabs { <!-- color: #062A00; --> color: #841F27; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; padding-bottom: 0px; } .panel-tab { margin-top: 0px; margin-bottom: 0px; margin-left: 3px; margin-right: 3px; padding-top: 0px; padding-bottom: 0px; } .panelset .panel-tabs .panel-tab { min-height: 40px; } .remark-slide th { border-bottom: 1px solid #ddd; } .remark-slide thead { border-bottom: 0px; } .gt_footnote { padding: 2px; } .remark-slide table { border-collapse: collapse; } .remark-slide tbody { border-bottom: 2px solid #666; } .important { background-color: lightpink; border: 2px solid blue; font-weight: bold; } .remark-code { display: block; overflow-x: auto; padding: .5em; background: #ffe7e7; } .remark-code, .remark-inline-code { font-family: 'Source Code Pro', 'Lucida Console', Monaco, monospace;font-size: 90%; } .hljs-github .hljs { background: #f2f2fd; } .remark-inline-code { padding-top: 0px; padding-bottom: 0px; background-color: #e6e6e6; } .r.hljs.remark-code.remark-inline-code{ font-size: 0.9em } .left-full { width: 80%; float: left; } .left-code { width: 38%; height: 92%; float: left; } .right-plot { width: 60%; float: right; padding-left: 1%; } .left6 { width: 60%; height: 92%; float: left; } .left5 { width: 49%; <!-- height: 92%; --> float: left; } .right5 { width: 49%; float: right; padding-left: 1%; } .right4 { 
width: 39%; float: right; padding-left: 1%; } .left3 { width: 29%; height: 92%; float: left; } .right7 { width: 69%; float: right; padding-left: 1%; } .left4 { width: 38%; float: left; } .right6 { width: 60%; float: right; padding-left: 1%; } ul li{ margin: 7px; } ul, li{ margin-left: 15px; padding-left: 0px; } ol li{ margin: 7px; } ol, li{ margin-left: 15px; padding-left: 0px; } </style> <style type="text/css"> .content-box { box-sizing: border-box; background-color: #e2e2e2; } .content-box-blue, .content-box-gray, .content-box-grey, .content-box-army, .content-box-green, .content-box-purple, .content-box-red, .content-box-yellow { box-sizing: border-box; border-radius: 5px; margin: 0 0 10px; overflow: hidden; padding: 0px 5px 0px 5px; width: 100%; } .content-box-blue { background-color: #F0F8FF; } .content-box-gray { background-color: #e2e2e2; } .content-box-grey { background-color: #F5F5F5; } .content-box-army { background-color: #737a36; } .content-box-green { background-color: #d9edc2; } .content-box-purple { background-color: #e2e2f9; } .content-box-red { background-color: #ffcccc; } .content-box-yellow { background-color: #fef5c4; } .content-box-blue .remark-inline-code, .content-box-blue .remark-inline-code, .content-box-gray .remark-inline-code, .content-box-grey .remark-inline-code, .content-box-army .remark-inline-code, .content-box-green .remark-inline-code, .content-box-purple .remark-inline-code, .content-box-red .remark-inline-code, .content-box-yellow .remark-inline-code { background: none; } .full-width { display: flex; width: 100%; flex: 1 1 auto; } </style> <style type="text/css"> blockquote, .blockquote { display: block; margin-top: 0.1em; margin-bottom: 0.2em; margin-left: 5px; margin-right: 5px; border-left: solid 10px #0148A4; border-top: solid 2px #0148A4; border-bottom: solid 2px #0148A4; border-right: solid 2px #0148A4; box-shadow: 0 0 6px rgba(0,0,0,0.5); /* background-color: #e64626; */ color: #e64626; padding: 0.5em; -moz-border-radius: 
5px; -webkit-border-radius: 5px; } .blockquote p { margin-top: 0px; margin-bottom: 5px; } .blockquote > h1:first-of-type { margin-top: 0px; margin-bottom: 5px; } .blockquote > h2:first-of-type { margin-top: 0px; margin-bottom: 5px; } .blockquote > h3:first-of-type { margin-top: 0px; margin-bottom: 5px; } .blockquote > h4:first-of-type { margin-top: 0px; margin-bottom: 5px; } .text-shadow { text-shadow: 0 0 4px #424242; } </style> <style type="text/css"> /****************** * Slide scrolling * (non-functional) * not sure if it is a good idea anyway slides > slide { overflow: scroll; padding: 5px 40px; } .scrollable-slide .remark-slide { height: 400px; overflow: scroll !important; } ******************/ .scroll-box-8 { height:8em; overflow-y: scroll; } .scroll-box-10 { height:10em; overflow-y: scroll; } .scroll-box-12 { height:12em; overflow-y: scroll; } .scroll-box-14 { height:14em; overflow-y: scroll; } .scroll-box-16 { height:16em; overflow-y: scroll; } .scroll-box-18 { height:18em; overflow-y: scroll; } .scroll-box-20 { height:20em; overflow-y: scroll; } .scroll-box-24 { height:24em; overflow-y: scroll; } .scroll-box-30 { height:30em; overflow-y: scroll; } .scroll-output { height: 90%; overflow-y: scroll; } </style>

# Outline

Review some concepts related to random variables

<!-- # main -->
1. CDF, PDF, PMF [(review)](#dist)
   + Exercise problem 1 [(here)](#ex1)
   + Exercise problem 2 (optional) [(here)](#ex2)
<!-- # To explain Jensen's inequality -->
2. Mean, variance, and covariance [(review)](#mean)
   + Exercise problem 3 [(here)](#ex3)
   + Exercise problem 4 (optional) [(here)](#ex4)
3. Introduction to Monte Carlo simulation [(here)](#intro)
4. Jensen's inequality (optional) [(here)](#jensen)

---
class: inverse, center, middle
name: dist

# CDF, PDF, and PMF

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=796px></html>

---

.content-box-red[**Distribution function**]

+ Cumulative distribution function (CDF)
  + **Definition:** <span style="color:red">The CDF of a random variable `\(X\)` is `\(F(x) = Pr[X \leq x]\)`</span>
  + **Verbally**: The CDF `\(F(x)\)` gives the probability of the event that the random variable `\(X\)` is <span style="color:red">less than or equal to</span> the value `\(x\)`.

.left5[
<img src="data:image/png;base64,#recitation2_slides_files/figure-html/unnamed-chunk-5-1.png" width="100%" style="display: block; margin: auto;" />
]

.right5[
<img src="data:image/png;base64,#recitation2_slides_files/figure-html/unnamed-chunk-6-1.png" width="100%" style="display: block; margin: auto;" />
]

---
<!-- PDF and PMF (1) -->

.content-box-red[**Probability mass function (Discrete random variables)**]

+ **Definition**: `\(\color{red}{\pi(x) = Pr[X = x]}\)`
+ **Verbally**: The probability that `\(X\)` equals the value `\(x\)`

<br>

.content-box-red[**Probability density function (Continuous random variables)**]

+ **Definition**: `\(\color{red}{f(x) = \frac{d}{dx}F(x)} \quad ( = \displaystyle \lim_{h\to 0} \frac{F(x+h)-F(x)}{h})\)`
+ **Verbally**: The density is the rate of change of the CDF at `\(x\)`. It is not itself a probability; integrating it over a range of values gives the probability that the random variable falls within that range (see [wikipedia](https://en.wikipedia.org/wiki/Probability_density_function)).
---
<!-- PDF and PMF (2) -->

.content-box-red[**Probability mass function (Discrete random variables)**]

+ **Definition**: `\(\color{red}{\pi(x) = Pr[X = x]}\)`
+ **Verbally**: The probability that `\(X\)` equals the value `\(x\)`

<br>

.content-box-red[**Probability density function (Continuous random variables)**]

+ **Definition**: `\(\color{red}{f(x) = \frac{d}{dx}F(x)} \quad ( = \displaystyle \lim_{h\to 0} \frac{F(x+h)-F(x)}{h})\)`
+ **Verbally**: The density is the rate of change of the CDF at `\(x\)`. It is not itself a probability; integrating it over a range of values gives the probability that the random variable falls within that range (see [wikipedia](https://en.wikipedia.org/wiki/Probability_density_function)).

<br>

.content-box-red[**Theorem 2.3: Properties of a PDF**]

A function `\(f(x)\)` is a density function **if and only if**

`$$\begin{cases} f(x) \ge 0 \text{ for all } x \\ \int_{-\infty}^\infty f(x)\,dx = 1 \end{cases}$$`

+ You can use these two conditions to check whether a function is a valid density function.

<!-- if you are asked to show that a function f(x) is a valid density function, check whether f(x) satisfies these properties or not. -->

---
class: middle

.content-box-green[**Relationship between CDF and PDF**]

+ **From CDF to PDF**: `\(f(x) = \frac{d}{dx}F(x)\)` <br> (by definition of PDF)
+ **From PDF to CDF**: `\(F(x) = Pr(X \leq x) = \int_{-\infty}^x f(t) dt\)` <br> (as shown below)

<img src="data:image/png;base64,#recitation2_slides_files/figure-html/unnamed-chunk-25-1.gif" width="80%" style="display: block; margin: auto;" />

---
name: ex1

# Exercise 1

.content-box-green[**Final Exam: 2021: Problem 1**]

Define `\(\Phi(z)\)` as the CDF of a standard normal random variable `\(Z\)` and `\(\phi(z)\)` as its density function.

(a) Write `\(Pr(Z \leq b)\)` using `\(\Phi()\)`.

(b) Write `\(Pr(Z \leq b)\)` as an integral.

(c) Write `\(Pr(a \leq Z \leq b)\)` using `\(\Phi()\)`.

(d) Write `\(Pr(a \leq Z \leq b)\)` as an integral.
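---

## CDF and PDF in R (optional sketch)

A minimal numerical sketch, assuming the standard normal distribution and R's built-in `pnorm()` (for `\(\Phi\)`) and `dnorm()` (for `\(\phi\)`). It checks three facts from the previous slides: the density is the limit of `\((F(x+h)-F(x))/h\)` as `\(h \to 0\)`, integrating the density up to `\(x\)` recovers the CDF, and a density integrates to 1 (Theorem 2.3). The evaluation point `x0` and the step `h` are arbitrary illustrative choices, not part of the original exercise.

.small-code[
```r
# Evaluation point and a small step size (arbitrary illustrative choices)
x0 <- 0.5
h  <- 1e-6

# 1. PDF as the derivative of the CDF: (F(x0 + h) - F(x0)) / h for small h
deriv_approx <- (pnorm(x0 + h) - pnorm(x0)) / h
c(approx = deriv_approx, exact = dnorm(x0))

# 2. CDF as the integral of the PDF: F(x0) = integral of f(t) from -Inf to x0
cdf_from_pdf <- integrate(dnorm, lower = -Inf, upper = x0)$value
c(integral = cdf_from_pdf, exact = pnorm(x0))

# 3. Theorem 2.3: a valid density integrates to 1 over the real line
integrate(dnorm, lower = -Inf, upper = Inf)$value
```
]

The same functions can be used to check numerical answers to Exercise 1 for particular values of `\(a\)` and `\(b\)`.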
---
name: ex2

# Exercise 2

.content-box-green[**PSE Exercise 2.1**]

Let `\(X \sim U[0,1]\)`. Find the PDF of the random variable `\(Y=X^2\)`.

---
class: inverse, center, middle
name: mean

# Mean, Variance, and Covariance

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=796px></html>

---

.content-box-red[**Mean and Variance**]

**Definitions** 2.18 and 2.19:

+ The mean of `\(X\)` is <span style='color:red'>`\(E[X]\)`</span>
+ The variance of `\(X\)` is <span style='color:red'>`\(Var[X]=E[(X-E[X])^2]\)`</span> `\(= E[X^2] - (E[X])^2\)`

.left5[
<img src="data:image/png;base64,#recitation2_slides_files/figure-html/unnamed-chunk-10-1.png" width="100%" style="display: block; margin: auto;" />
]

.right5[
<img src="data:image/png;base64,#recitation2_slides_files/figure-html/unnamed-chunk-11-1.png" width="100%" style="display: block; margin: auto;" />
]

---

.content-box-red[**Covariance**]

**Definition:**

$$
\color{red}{Cov(X, Y) = E[(X-E[X])(Y-E[Y])]} = E[XY] - E[X]E[Y]
$$

**Verbally**: Covariance measures the joint variability of two random variables, i.e., how much they tend to move together.

+ .content-box-green[Visualization]

<img src="data:image/png;base64,#recitation2_slides_files/figure-html/unnamed-chunk-12-1.png" width="100%" style="display: block; margin: auto;" />

---

.content-box-red[**Covariance**]

**Definition:**

$$
\color{red}{Cov(X, Y) = E[(X-E[X])(Y-E[Y])]} = E[XY] - E[X]E[Y]
$$

**Verbally**: Covariance measures the joint variability of two random variables, i.e., how much they tend to move together.

+ .content-box-green[Visualization]

<img src="data:image/png;base64,#recitation2_slides_files/figure-html/unnamed-chunk-13-1.png" width="100%" style="display: block; margin: auto;" />

.content-box-red[**Correlation**]

**Definition:**

$$
\color{red}{Corr(X, Y) = \frac{Cov(X,Y)}{\sqrt{Var[X] Var[Y]}}}
$$

---

## Play around with data (optional)

.content-box-green[**Goal**]

+ Understand some basic R functions (e.g., mean, variance, etc.)
+ See that covariance changes when a variable is rescaled (e.g., a change of units), but correlation does not.
.small-code[
```r
# === Data === #
data(airquality)
```

```r
#/*--------------------------------*/
#' ## Basic functions of R
#/*--------------------------------*/

# --- histogram of Temperature (Temp) --- #
hist(airquality$Temp)
# a frequency table can be obtained by running table(airquality$Temp)

# --- Mean of Temp (degrees F) --- #
mean(airquality$Temp)

# --- Variance of Temp --- #
var(airquality$Temp)
# sd(airquality$Temp) for the standard deviation

# --- summary statistics of Temp --- #
summary(airquality$Temp)
```

```r
#/*------------------------------------------*/
#' ## Relationship between Wind and Temp
#/*------------------------------------------*/

# === Covariance === #
cov(airquality$Wind, airquality$Temp)
# What happens if you change the unit of Wind from mph to km/h (1 mph is about 1.6 km/h)?
cov(airquality$Wind*1.6, airquality$Temp)

# === Correlation === #
cor(airquality$Wind, airquality$Temp)
# What happens if you change the unit of Wind from mph to km/h (1 mph is about 1.6 km/h)?
cor(airquality$Wind*1.6, airquality$Temp)
```
]

---

## E[ ], Var[ ] as operators

.content-box-red[**Expectation: E[ ]**]

<span style="color:red"> 1. `\(E[ \,]\)` is a linear operator (linearity of expectation)</span>

For any constants `\(a\)` and `\(b\)`,

`$$E[a+bX] = a + bE[X]$$`

`$$E[aX+bY] = aE[X] + bE[Y]$$`

.content-box-green[**Question**]

+ `\(E[X+Y^2] = E[X]+E[Y^2]\)`?
+ `\(E[XY] = E[X]E[Y]\)`?

---

## E[ ], Var[ ] as operators

.content-box-red[**Expectation: E[ ]**]

<span style="color:red"> 1. `\(E[ \,]\)` is a linear operator (linearity of expectation)</span>

For any constants `\(a\)` and `\(b\)`,

`$$E[a+bX] = a + bE[X]$$`

`$$E[aX+bY] = aE[X] + bE[Y]$$`

.content-box-green[**Question**]

+ `\(E[X+Y^2] = E[X]+E[Y^2]\)`?
  * Yes.
+ `\(E[XY] = E[X]E[Y]\)`?
  * Not in general. It holds when `\(X\)` and `\(Y\)` are independent (Proof?). More generally, it holds if and only if `\(Cov(X,Y)=0\)`, i.e., `\(X\)` and `\(Y\)` are uncorrelated.

---

## E[ ], Var[ ] as operators

.content-box-red[**Expectation: E[ ]**]

<span style="color:red"> 1.
`\(E[ \,]\)` is a linear operator (linearity of expectation)</span>

For any constants `\(a\)` and `\(b\)`,

`$$E[a+bX] = a + bE[X]$$`

`$$E[aX+bY] = aE[X] + bE[Y]$$`

.content-box-green[**Question**]

+ `\(E[X+Y^2] = E[X]+E[Y^2]\)`?
  * Yes.
+ `\(E[XY] = E[X]E[Y]\)`?
  * Not in general. It holds when `\(X\)` and `\(Y\)` are independent (Proof?). More generally, it holds if and only if `\(Cov(X,Y)=0\)`, i.e., `\(X\)` and `\(Y\)` are uncorrelated.

<span style="color:red"> 2. Law of iterated expectations</span>

$$
E[E[Y|X]] = E[Y]
$$

---

## E[ ], Var[ ] as operators

.content-box-red[**Variance: Var[ ]**]

`\(Var[ \,]\)` is <span style="color:red">not</span> a linear operator. For any constants `\(a\)` and `\(b\)`,

$$
Var[a+bX] = b^2Var[X]
$$

because

`$$\begin{align*} Var[a+bX] &= E[(a+bX - E[a+bX])^2] \quad &(\text{definition of variance})\\ &= E[(a+bX - a-bE[X])^2] \quad &(\text{linearity of expectation})\\ &= E[(b(X - E[X]))^2] \\ &= E[b^2(X - E[X])^2] \\ &= b^2 E[(X - E[X])^2] \quad &(\text{linearity of expectation})\\ &= b^2 Var[X] \quad &(\text{definition of variance}) \end{align*}$$`

---
name: ex3

# Exercise 3

.content-box-green[**Lecture note 2, p14**]

Prove the following for continuous `\((X,Y)\)` with finite variances.

(a) If `\(E[X]=0\)` or `\(E[Y]=0\)`, then `\(Cov(X,Y)=E[XY]\)`.

(b) If `\(X \perp\!\!\!\perp Y\)`, then `\(corr(X,Y)=0\)`.

(c) If `\(E[X] = E[Y] = 0\)`, then `\(Var[X+Y] = Var[X] + Var[Y] + 2Cov(X,Y)\)` (Note: this also holds when the expectations are nonzero).

(d) If `\(X\)` and `\(Y\)` are uncorrelated, then `\(Var[X+Y] = Var[X] + Var[Y]\)`.

---
name: ex4

# Exercise 4

.content-box-green[**Final Exam: 2021: Problem 3**]

The chi-squared distribution with `\(k\)` degrees of freedom, denoted `\(\chi^2(k)\)`, is the distribution of `\(\sum_{i=1}^k Z^2_{i}\)`, where each `\(Z_i \sim N(0,1)\)` and the `\(Z_i\)` are independent `\((Z_i \perp\!\!\!\perp Z_j \text{ for } i \neq j)\)`.

*You do not need to work with the CDF or density of a `\(\chi^2\)` distribution to answer this question!*

<br>

(a) Show that if `\(X\)` is distributed `\(\chi^2(k)\)`, then `\(E[X]=k\)`.

<br>

(b) More work with expectation. Let `\(K=Z^2_{1} + Z^2_{2}\)`, where the `\(Z_j \sim N(0,1)\)` are independent. Then `\(K \sim \chi^2(2)\)`. Another fact is that if `\(Z \sim N(0,1)\)`, then `\(E[Z^4]=3\)`.
Use that fact to show that `\(Var[K]=4\)`. [Hint: `\(E[Z_j^4]\)` is closely related to `\(Var[Z_j^2]\)`.]

---
class: inverse, center, middle
name: intro

# Monte Carlo Simulation: Brief introduction

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=796px></html>

---
class: middle

.content-box-red[**Monte Carlo Simulation**]

+ A way to test econometric theories or statistical procedures in a realistic setting via simulation: we generate data from a process whose true parameters we know, and check whether the procedure recovers them.

---
class: middle

.content-box-green[**Example: Binomial distribution (PSE 3.4)**]

A binomial random variable equals the number of successes in `\(n\)` independent Bernoulli trials. If you flip a coin `\(n\)` times, the number of heads has a binomial distribution.

<br>

Theoretically, if `\(X\)` is binomial with `\(n\)` trials and success probability `\(p\)`, then

`$$\begin{align*} E[X] &= np \\ Var[X] &= np(1-p) \end{align*}$$`

<br>

---
class: middle

.content-box-green[**Example: Binomial distribution (PSE 3.4)**]

Suppose that we flip a coin `\(n=9\)` times and count the number of heads (i.e., `\(X\)`). The coin is not fair: `\(p=Pr[heads]=\frac{1}{3}\)`.

According to the theory,

+ `\(E[X]=np = 9 \times \frac{1}{3} = 3\)`
+ `\(Var[X]=np(1-p) = 9 \times \frac{1}{3} (1 - \frac{1}{3}) = 2\)`

Can we confirm this with a Monte Carlo simulation?

---
class: middle

## Monte Carlo Simulation: Steps

+ Step 1: Specify the data generating process
+ Step 2: Repeat:
  * Step 2.1: Generate data based on the data generating process
  * Step 2.2: Compute the outcome you are interested in from the generated data
+ Step 3: Compare your estimates with the true parameters

<br>

In our case, the outcome is the number of heads ($X$). We use the simulated outcomes to "estimate" the true parameters <span style="color:blue">$E[X]=3$</span> and <span style="color:blue">$Var[X]=2$</span>.
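---

## Exact moments from the PMF (optional sketch)

Before simulating, `\(E[X]\)` and `\(Var[X]\)` can also be computed exactly from the binomial PMF, using `\(E[X]=\sum_x x\,\pi(x)\)` and `\(Var[X]=\sum_x x^2\pi(x) - (E[X])^2\)`. A minimal sketch, assuming R's built-in `dbinom()` for the binomial PMF and the same `\(n=9\)`, `\(p=1/3\)` as above; the variable names are illustrative.

.small-code[
```r
n <- 9      # number of coin flips
p <- 1/3    # probability of heads

x    <- 0:n                            # support of X: 0, 1, ..., n heads
pi_x <- dbinom(x, size = n, prob = p)  # PMF: Pr[X = x]

sum(pi_x)                               # a valid PMF sums to 1
mean_x <- sum(x * pi_x)                 # E[X] = sum over x of x * Pr[X = x]
var_x  <- sum(x^2 * pi_x) - mean_x^2    # Var[X] = E[X^2] - (E[X])^2
c(mean = mean_x, var = var_x)
```
]

Up to floating-point error, these reproduce `\(np = 3\)` and `\(np(1-p) = 2\)`; the Monte Carlo results that follow approximate the same two quantities from random draws.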
---

## For explanation: A single iteration

.medium-code[
```r
set.seed(1234)

# --- Step 1: Specify the data generating process --- #
p <- 1/3 # the probability of getting heads
n <- 9 # the number of trials

# --- Step 2.1: generate data --- #
seq_x <- sample(c(1,0), size=n, prob=c(p, 1-p), replace=TRUE)
seq_x
```

```
## [1] 0 0 0 0 1 0 0 0 0
```

```r
# --- Step 2.2: get the outcome you are interested in --- #
sum(seq_x) # the number of heads
```

```
## [1] 1
```
]

---

## Repeat the calculation many times

```r
B <- 1000 # the number of iterations

# create storage for the outcomes
storage <- numeric(B)

# --- Step 2: Run the simulation --- #
for(i in 1:B){
  seq_x <- sample(c(1,0), size=n, prob = c(p, 1-p), replace=TRUE)
  storage[i] <- sum(seq_x)
}

# --- Step 3: compare the results with the true values (E[X]=3, Var[X]=2) --- #
# Mean
mean(storage)
```

```
## [1] 2.984
```

```r
# Variance
var(storage)
```

```
## [1] 1.831576
```

---
class: inverse, center, middle
name: jensen

# Jensen's inequality (optional)

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=796px></html>

---

## Motivation

Linearity of expectation cannot be used when the function inside `\(E[\,]\)` is nonlinear.

.content-box-green[**Example:**]

+ If `\(g(x)\)` is a linear function (e.g., `\(g(x)=ax+b\)`)
  * `\(E[g(X)]=g(E[X])\)`
+ If `\(g(x)\)` is a nonlinear function (e.g., `\(g(x)=x^2\)`)
  * `\(E[g(X)] \neq g(E[X])\)` in general

---

(This is not a proof.)

On a previous slide, we saw `\(Var[X]=E[(X-E[X])^2]=E[X^2] - (E[X])^2\)`. Because `\(Var[X] \ge 0\)`,

`$$E[X^2] - (E[X])^2 \ge 0$$`

<p style="text-align: center;">or</p>

`$$(E[X])^2 \leq E[X^2]$$`

Define `\(g(x)=x^2\)`.
Then this inequality can be written as

`$$g(E[X]) \leq E[g(X)]$$`

More generally, Jensen's inequality states that

`$$\begin{align*} g(E[X]) \leq E[g(X)] \quad &\text{if } g(x) \text{ is a convex function} \\ E[g(X)] \leq g(E[X]) \quad &\text{if } g(x) \text{ is a concave function} \end{align*}$$`

+ So, the first step in using Jensen's inequality is to check whether `\(g(x)\)` is a convex or a concave function.

---

.content-box-green[**Visualization**]

.panelset[
.panel[.panel-name[Example 1: g(x) is convex]

.left5[
Suppose that `\(g(x)=x^2\)`.

.small-code[
```r
library(ggplot2)

set.seed(356)
# Draw 1000 values of X from a uniform distribution on [0, 10]
x <- runif(1000, 0, 10)

# /*===== Convex case: g(X)=X^2 =====*/
y <- x^2

figure_ex1 <- ggplot()+
  geom_point(aes(x = x, y = y))+
  # --- E[X] --- #
  geom_vline(xintercept = mean(x), color = "red", linetype = "dashed")+
  annotate("text", x = mean(x)+1, y = 0.01, label = paste0("E[X]=", round(mean(x), 1)), size = 3, color = "red") +
  # --- Add horizontal line for E[g(X)] --- #
  geom_hline(yintercept = mean(y), color="blue", linetype = "dashed")+
  annotate("text", x = 1, y = mean(y)+5, label = paste0("E[g(X)]=", round(mean(y), 1)), size = 3, color = "blue") +
  # --- Add horizontal line for g(E[X]) --- #
  geom_hline(yintercept = mean(x)^2, color="darkgreen", linetype = "dashed")+
  annotate("text", x = 1, y = mean(x)^2-5, label = paste0("g(E(X))=", round(mean(x)^2, 1)), size = 3, color = "darkgreen") +
  theme_bw()
```
]
]

.right5[
<img src="data:image/png;base64,#recitation2_slides_files/figure-html/unnamed-chunk-20-1.png" width="100%" style="display: block; margin: auto;" />

`$$\color{darkgreen}{g(E[X])} \leq \color{blue}{E[g(X)]}$$`
]
]

.panel[.panel-name[Example 2: g(x) is concave]

.left5[
Suppose that `\(g(x)=\sqrt{x}\)`.
.small-code[
```r
# /*===== Concave case: g(X)=X^(1/2) =====*/
y <- x^(1/2)

figure_ex2 <- ggplot()+
  geom_point(aes(x = x, y = y))+
  # --- E[X] --- #
  geom_vline(xintercept = mean(x), color = "red", linetype = "dashed")+
  annotate("text", x = mean(x)+0.8, y = 0.01, label = paste0("E[X]=", round(mean(x), 1)), size = 3, color = "red") +
  # --- E[g(X)] --- #
  geom_hline(yintercept = mean(y), color = "blue", linetype = "dashed")+
  annotate("text", x = 1, y = mean(y)-0.2, label = paste0("E[g(X)]=", round(mean(y), 2)), size = 3, color = "blue") +
  # --- g(E[X]) --- #
  geom_hline(yintercept = mean(x)^(1/2), color = "darkgreen", linetype = "dashed")+
  annotate("text", x = 1, y = mean(x)^(1/2)+0.2, label = paste0("g(E(X))=", round(mean(x)^(1/2), 2)), size = 3, color = "darkgreen") +
  theme_bw()
```
]
]

.right5[
<img src="data:image/png;base64,#recitation2_slides_files/figure-html/unnamed-chunk-22-1.png" width="100%" style="display: block; margin: auto;" />

`$$\color{blue}{E[g(X)]} \leq \color{darkgreen}{g(E[X])}$$`
]
]
]

---
class: middle

.content-box-red[**Implication**]

If the underlying data-generating process is nonlinear (e.g., the impact of precipitation on crop yield), aggregation (e.g., from field-year level to county-level data) might mask the true relationship, because the average of a nonlinear function is generally not equal to the function evaluated at the average.
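---

## Aggregation and Jensen's inequality (optional sketch)

A minimal sketch of the implication above, under loudly hypothetical assumptions: a made-up concave yield response `\(g(\text{rain}) = 10\,\text{rain} - 0.5\,\text{rain}^2\)` and made-up field-level rainfall drawn from `\(U[2, 12]\)`. Neither comes from real data. Because `\(g\)` is concave, the average of field-level yields, `\(E[g(X)]\)`, is at most the yield evaluated at average rainfall, `\(g(E[X])\)`, so plugging county-average rainfall into the field-level response would overstate average yield.

.small-code[
```r
set.seed(8211)

# Hypothetical concave yield response to rainfall (illustration only)
g <- function(rain) 10 * rain - 0.5 * rain^2

# Hypothetical rainfall on 500 fields within one county
rain <- runif(500, min = 2, max = 12)

field_avg_yield <- mean(g(rain))  # E[g(X)]: average of field-level yields
yield_at_avg    <- g(mean(rain))  # g(E[X]): yield at county-average rainfall
c(avg_of_yields = field_avg_yield, yield_at_avg_rain = yield_at_avg)

# For this quadratic g, the gap equals 0.5 * (mean(rain^2) - mean(rain)^2) >= 0
yield_at_avg - field_avg_yield
0.5 * (mean(rain^2) - mean(rain)^2)
```
]

The quadratic form is chosen only because the gap then has a closed form, `\(0.5\,Var[X]\)`; for other concave responses the inequality still holds, but the size of the gap depends on the shape of `\(g\)`.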